class: center, middle, inverse, title-slide

# FIN7028: Time Series Financial Econometrics 7
## Linear time series models
### Barry Quinn
### 2022-03-04

---
layout: true

<div class="my-footer"> <span> Barry Quinn </span> </div>

---
class: middle

# Learning outcomes

.large[
- Stationarity and differencing
- Modelling stationary time series
- ARIMA models
- Forecasting at scale: Prophet revisited
]

---
class: middle

# Stationarity and differencing

* The foundation of statistical inference in time series analysis is the concept of weak stationarity. A stationary series is:
  * roughly horizontal
  * constant variance
  * no patterns predictable in the long-term

---
class: middle

.your-turn[
Are these financial time series stationary?
.pull-left[
<!-- -->
]
.pull-right[
<!-- -->
]
]

---
class: middle

.pull-left[
#### Inference and stationarity
* The monthly log returns of the Russell 2000 index vary around zero over time.
* If we divide the data into subperiods, we would expect each sample mean to be roughly zero.
* Furthermore, except during the recent financial crisis (2007-2009), the log returns range over approximately [-0.2, 0.2].
* Statistically, the mean and the variance are constant over time, i.e. time invariant.
* Together, these two time-invariant properties characterise a weakly stationary series.
]
.pull-right[
#### Weak stationarity and prediction
* Weak-form stationarity provides a basic framework for prediction.
* For the monthly log returns of the Russell 2000 we can predict with reasonable confidence:
  * Future monthly returns `\(\approx0\)` and vary within `\([-0.2,0.2]\)`
]

---
class: middle

## Inference and nonstationarity

* Consider quarterly earnings for Carnival Plc.
* If the timespan is divided into subperiods, the sample mean and variance of each period show an increasing pattern.
* Earnings are **not** weakly stationary.
* Models and methods do exist for modelling such nonstationary series.

## Your turn: Stationary?
<!-- -->

---
class: middle

## Non-stationarity in the mean

**Identifying non-stationary series**

* time plot.
* The ACF of stationary data drops to zero relatively quickly.
* The ACF of non-stationary data decreases slowly.
* For non-stationary data, the value of `\(r_1\)` is often large and positive.

## Example: FTSE index
<!-- -->
## Example: FTSE index
<!-- -->
## Example: FTSE index
<!-- -->
## Example: FTSE index
<!-- -->

---
class: middle

## Differencing

* Differencing helps to **stabilize the mean**.
* The differenced series is the *change* between each observation in the original series: `\({y'_t = y_t - y_{t-1}}\)`.
* The differenced series will have only `\(T-1\)` values, since it is not possible to calculate a difference `\(y_1'\)` for the first observation.

---
class: middle

## Carnival earnings ending 2010 Q1
<!-- -->

## Log Carnival earnings
<!-- -->

## Log Carnival earnings seasonally differenced

```r
window(carnival_eps_ts,end=c(2010,1)) %>% log() %>% diff(lag=4) %>% autoplot()
```

<img src="data:image/png;base64,#07.linear_time_series_models_files/figure-html/carnival3-1.png" height="60%" />

## Log Carnival earnings differenced twice

```r
window(carnival_eps_ts,end=c(2010,1)) %>% log() %>% diff(lag=4) %>% diff(lag=1) %>% autoplot()
```

<img src="data:image/png;base64,#07.linear_time_series_models_files/figure-html/carnival4-1.png" height="60%" />

---
class: middle

## Carnival earnings

* The seasonally differenced series is closer to being stationary.
* Remaining non-stationarity can be removed with a further first difference. If `\(y'_t = y_t - y_{t-4}\)` denotes the seasonally differenced (quarterly) series, then the twice-differenced series is `\(y''_t = y'_t - y'_{t-1}\)`.

#### Seasonal differencing

When both seasonal and first differences are applied…

* it makes no difference which is done first; the result will be the same.
* If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
* It is important that, if differencing is used, the differences are interpretable.

---
class: middle

## Interpretation of differencing

* First differences are the change between **one observation and the next**;
* seasonal differences are the change from **one year to the next**.
* But taking lag-3 differences of yearly data, for example, results in a model which cannot be sensibly interpreted.

---
class: middle

## Unit root tests

> Statistical tests to determine the required order of differencing

1. Augmented Dickey-Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.
2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
3. Other tests are available for seasonal data.

## KPSS test

```r
library(urca)
summary(ur.kpss(ftse_m_ts))
```

```
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.3983 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
```

```r
ndiffs(ftse_m_ts)
```

```
## [1] 1
```

## Automatically selecting differences

STL decomposition: `\(y_t = T_t+S_t+R_t\)`

Seasonal strength `\(F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)\)`

If `\(F_s > 0.64\)`, do one seasonal difference.
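The seasonal-strength rule can be computed directly from an STL decomposition. A minimal sketch using base R's `stl()`, with the built-in monthly `AirPassengers` series standing in for the course data (an assumption; any seasonal `ts` object works):

```r
# Sketch of the seasonal-strength heuristic F_s = max(0, 1 - Var(R)/Var(S + R)),
# using the built-in AirPassengers series (a stand-in, not the slide data).
decomp <- stl(log(AirPassengers), s.window = "periodic")
S <- decomp$time.series[, "seasonal"]
R <- decomp$time.series[, "remainder"]

Fs <- max(0, 1 - var(R) / var(S + R))
round(Fs, 2)  # seasonality is strong here, so Fs exceeds the 0.64 threshold
```

`nsdiffs()` in the forecast package automates this kind of check (its default seasonal test has varied across package versions).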
```r
carnival_eps_ts %>% log() %>% nsdiffs()
```

```
## [1] 1
```

```r
carnival_eps_ts %>% log() %>% diff(lag=4) %>% ndiffs()
```

```
## [1] 0
```

---
class: middle

## Non-seasonal ARIMA models

#### Autoregressive models

* When `\(y_t\)` has a statistically significant lag-1 autocorrelation, the lagged value `\(y_{t-1}\)` might be useful in predicting `\(y_t\)`.
* AR(1) model
$$ y_{t}= c+\phi_{1}y_{t - 1} + \varepsilon_{t},$$
where `\(\varepsilon_t\)` is white noise. This is a simple linear regression with **lagged values** of `\(y_t\)` as predictors.
* This simple model is widely used in stochastic volatility modelling, where `\(y_t\)` is replaced by log volatility.

---
class: middle

## Autoregressive models

* More generally, if `\(y_t\)` depends on more than just lag 1, we can generalise the AR(1) to an AR(p) model.

.blockquote[
Autoregressive (AR) models:
$$ y_{t}= c+\phi_{1}y_{t - 1}+\phi_{2}y_{t - 2} + \cdots+\phi_{p}y_{t - p} + \varepsilon_{t},$$
where `\(\varepsilon_t\)` is white noise. This is a multiple linear regression with **lagged values** of `\(y_t\)` as predictors.
]

---
class: middle

# Example of an AR(1) model

.panelset[
.panel[
.panel-name[Simulating an AR(1)]
.pull-left[
* Simulating `\(y_{t} =2 -0.8 y_{t - 1}+\varepsilon_{t}\)`
* where `\(\varepsilon_t\sim N(0,1)\)` for `\(T=100\)`.
]
.pull-right[
<!-- -->
]
]
.panel[
.panel-name[Simulating an AR(2)]
.pull-left[
* Simulating `\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t\)`
* where `\(\varepsilon_t\sim N(0,1)\)` for `\(T=100\)`.
]
.pull-right[
<!-- -->
]
]
.panel[
.panel-name[AR(1) models explained]

> `\(y_{t}=c + \phi_1 y_{t -1}+\varepsilon_{t}\)`

* When `\(\phi_1=0\)`, `\(y_t\)` is **equivalent to white noise**
* When `\(\phi_1=1\)` and `\(c=0\)`, `\(y_t\)` is **equivalent to a random walk**
* When `\(\phi_1=1\)` and `\(c\ne0\)`, `\(y_t\)` is **equivalent to a random walk with drift**
* When `\(\phi_1<0\)`, `\(y_t\)` tends to **oscillate between positive and negative values**.
]
]

---
class: middle

## Moving Average (MA) models

.blockquote[
#### Moving Average (MA) models:
`$$y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q},$$`
- where `\(\varepsilon_t\)` is white noise.
- This is a multiple regression with **past errors** as predictors. **Don't confuse this with moving average smoothing!**
]

<!-- -->

---
class: middle

.hand-large[Putting it all together]

### ARIMA models - Autoregressive Integrated Moving Average models

.blockquote[
Autoregressive Moving Average (ARMA) models
`$$\begin{align*} y_{t} &= c+ \phi_{1}y_{t - 1} +\cdots +\phi_{p}y_{t-p} \\ &\quad+ \theta_{1}\varepsilon_{t - 1} + \cdots +\theta_{q}\varepsilon_{t-q} +\varepsilon_{t}. \end{align*}$$`
]

--

* Predictors include both **lagged values of `\(y_t\)` and lagged errors.**
* Conditions on the coefficients ensure stationarity.
* Conditions on the coefficients ensure invertibility.
* Combine an ARMA model with **differencing** to get an ARIMA model.

---
class: middle

## ARIMA models notation

.pull-left[
> Autoregressive Integrated Moving Average models

- ARIMA(p, d, q) model
- AR: p = order of the autoregressive part
- I: d = degree of first differencing involved
- MA: q = order of the moving average part.
]
.pull-right[
* White noise model: ARIMA(0,0,0)
* Random walk: ARIMA(0,1,0) with no constant
* Random walk with drift: ARIMA(0,1,0) with constant term
* AR($p$): ARIMA($p$,0,0)
* MA($q$): ARIMA(0,0,$q$)
]

* ARIMA(1,1,1) model:
$$y_t = c + y_{t-1} + \phi_1 y_{t-1}- \phi_1 y_{t-2} + \theta_1\varepsilon_{t-1} + \varepsilon_t $$

---
class: middle

## ARIMA modelling of macroeconomic time series

.panelset[
.panel[
.panel-name[The data]
<!-- -->
]
.panel[
.panel-name[Fit an ARIMA(2,0,2)]

```r
((fit <- arima(uschange[,"Consumption"],order = c(2,0,2))))
```

```
## 
## Call:
## arima(x = uschange[, "Consumption"], order = c(2, 0, 2))
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept
##       1.3908  -0.5813  -1.1800  0.5584     0.7463
## s.e.  0.2553   0.2078   0.2381  0.1403     0.0845
## 
## sigma^2 estimated as 0.3417:  log likelihood = -165.14,  aic = 342.28
```
]
.panel[
.panel-name[Model estimates in math]
- `\(y_t = c + 1.391y_{t-1} -0.581y_{t-2}-1.18 \varepsilon_{t-1}+ 0.558\varepsilon_{t-2}+ \varepsilon_{t}\)`
- where `\(c = 0.7463\times(1-1.3908+0.5813) = 0.142\)` (the reported `intercept` is the series mean, not `\(c\)`)
- and `\(\varepsilon_t\)` is white noise with a standard deviation of `\(0.585 = \sqrt{0.342}\)`.
]
.panel[
.panel-name[Forecasts]

```r
fit %>% forecast(h=10) %>% autoplot(include=80)
```

<!-- -->
]
]

---
class: middle

## Understanding ARIMA models

* If `\(c=0\)` and `\(d=0\)`, the long-term forecasts will go to zero.
* If `\(c=0\)` and `\(d=1\)`, the long-term forecasts will go to a non-zero constant.
* If `\(c=0\)` and `\(d=2\)`, the long-term forecasts will follow a straight line.
* If `\(c\ne0\)` and `\(d=0\)`, the long-term forecasts will go to the mean of the data.
* If `\(c\ne0\)` and `\(d=1\)`, the long-term forecasts will follow a straight line.
* If `\(c\ne0\)` and `\(d=2\)`, the long-term forecasts will follow a quadratic trend.

---
class: middle

## Understanding ARIMA models

### Forecast variance and `\(d\)`

* The higher the value of `\(d\)`, the more rapidly the prediction intervals increase in size.
* For `\(d=0\)`, the long-term forecast standard deviation will go to the standard deviation of the historical data.

### Cyclic behaviour

* For cyclic forecasts, `\(p\ge2\)` and some restrictions on the coefficients are required.
* If `\(p=2\)`, we need `\(\phi_1^2+4\phi_2<0\)`. Then the average length of the stochastic cycles is
`$$(2\pi)/\left[\text{arc cos}(-\phi_1(1-\phi_2)/(4\phi_2))\right].$$`
* This formula has important uses in estimating business and economic cycles. (See Example 2.3 in Tsay (2010).)

---
class: middle

# Estimation and order selection

## Maximum likelihood estimation

- Having identified the model order, we need to estimate the parameters `\(c\)`, `\(\phi_1,\dots,\phi_p\)`, `\(\theta_1,\dots,\theta_q\)`.

* MLE is very similar to the least squares estimates obtained by minimizing `\(\sum_{t=1}^T e_t^2\)`.
* The `Arima()` command allows CLS or MLE estimation.
* Non-linear optimization must be used in either case.
* Different software will give different estimates.

---
class: middle

## Partial autocorrelations

.blockquote[
- Partial autocorrelations measure the relationship between `\(y_{t}\)` and `\(y_{t - k}\)` when the effects of the intermediate lags `\(1, 2, 3, \dots, k - 1\)` are removed.
- `\(\alpha_k\)` = `\(k\)`th partial autocorrelation coefficient
- `\(\alpha_k\)` is equal to the estimate of `\(\phi_k\)` in the regression:
`$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k}$$`
]

* Varying the number of terms on the RHS gives `\(\alpha_k\)` for different values of `\(k\)`.
* There are more efficient ways of calculating `\(\alpha_k\)`.
* `\(\alpha_1=\rho_1\)`
* same critical values of `\(\pm 1.96/\sqrt{T}\)` as for the ACF.

---
class: middle

## Example: US consumption

<!-- -->

---
class: middle

## ACF and PACF interpretation

**AR(1)**

`$$\rho_k =\phi_1^k \qquad\text{for } k=1,2,\dots$$`
`$$\alpha_1= \phi_1,\qquad \alpha_k = 0\quad\text{for } k=2,3,\dots$$`

So we have an AR(1) model when

* the autocorrelations decay exponentially
* there is a single significant partial autocorrelation.
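The AR(1) signature above can be checked on a simulated series. A minimal sketch in base R (the coefficient 0.7 and sample size are arbitrary choices, not from the slides):

```r
# Simulate an AR(1) with phi_1 = 0.7 and inspect its ACF/PACF signature:
# the ACF should decay roughly like 0.7^k, while the PACF should show a
# single significant spike at lag 1.
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 500)

acf_vals  <- acf(y, plot = FALSE)$acf[2:6]   # drop lag 0; keep lags 1..5
pacf_vals <- pacf(y, plot = FALSE)$acf[1:5]  # pacf starts at lag 1

round(acf_vals, 2)   # geometric-style decay
round(pacf_vals, 2)  # large at lag 1, small thereafter
```

The `\(\pm 1.96/\sqrt{T}\)` bands mentioned earlier give the significance threshold for each spike.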
---
class: middle

## ACF and PACF interpretation

**AR($p$)**

* ACF dies out in an exponential or damped sine-wave manner
* PACF has all zero spikes beyond the `\(p\)`th spike

So we have an AR($p$) model when

* the ACF is exponentially decaying or sinusoidal
* there is a significant spike at lag `\(p\)` in the PACF, but none beyond `\(p\)`

## ACF and PACF interpretation

**MA(1)**

`$$\rho_1 = \frac{\theta_1}{1+\theta_1^2},\qquad \rho_k = 0\quad\text{for } k=2,3,\dots;\qquad \alpha_k = -(-\theta_1)^k$$`

So we have an MA(1) model when

* the PACF is exponentially decaying and
* there is a single significant spike in the ACF

## ACF and PACF interpretation

**MA($q$)**

* PACF dies out in an exponential or damped sine-wave manner
* ACF has all zero spikes beyond the `\(q\)`th spike

So we have an MA($q$) model when

* the PACF is exponentially decaying or sinusoidal
* there is a significant spike at lag `\(q\)` in the ACF, but none beyond `\(q\)`

---
class: middle

## Information criteria

**Akaike's Information Criterion (AIC):**
`$$\text{AIC} = -2 \log(L) + 2(p+q+k+1),$$`
where `\(L\)` is the likelihood of the data, `\(k=1\)` if `\(c\ne0\)` and `\(k=0\)` if `\(c=0\)`.

**Corrected AIC:**
`$$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}.$$`

**Bayesian Information Criterion:**
`$$\text{BIC} = \text{AIC} + [\log(T)-2](p+q+k+1).$$`

> Good models are obtained by minimizing either the AIC, AICc or BIC. My preference is to use the AICc.

---
class: middle

# Non-stationary time series models

## Random walk with drift?
> `\(y_t = 10 + 0.99y_{t-1}+ \varepsilon_t\)`

```r
set.seed(1)
autoplot(10 + arima.sim(list(ar =0.99), n = 100)) +
  ylab("") + ggtitle("Random Walk with Drift?")
```

<img src="data:image/png;base64,#07.linear_time_series_models_files/figure-html/rw_drift-1.png" width="60%" style="display: block; margin: auto;" />
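One way to probe the question in the slide title is with the unit root tools from earlier. A sketch, assuming the forecast package's `ndiffs()` as used above; in a sample of 100 a near-unit-root AR(1) is hard to distinguish from a genuine random walk with drift:

```r
library(forecast)  # for ndiffs()

# Re-create the near-unit-root AR(1) from the slide, plus an actual random
# walk with drift, then ask the KPSS-based ndiffs() about each series.
set.seed(1)
near_unit_root <- 10 + arima.sim(list(ar = 0.99), n = 100)
rw_with_drift  <- ts(cumsum(0.1 + rnorm(100)))  # y_t = y_{t-1} + 0.1 + e_t

ndiffs(near_unit_root)  # typically suggests differencing, despite stationarity in theory
ndiffs(rw_with_drift)   # differencing needed: the series has a unit root
```

This is why visual inspection alone is unreliable for near-unit-root processes.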